Parallel knowledge gradient method


e8f2779682fd11fa2067beffc27a9192-Supplemental.pdf

Neural Information Processing Systems

In this analysis, we assume that evaluating the GP prior mean and kernel functions (and the corresponding derivatives) takes O(1) time. For each fantasy model, we need to compute the posterior mean and covariance matrix at the L points (x, w_{1:L}) on which we draw the sample paths. This results in a total cost of O(KML^2) to generate all samples. The SAA approach replaces a stochastic optimization problem with a deterministic approximation, which can be optimized efficiently. Suppose that we are interested in the optimization problem min_x E_ω[h(x, ω)].
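To make the last point concrete, here is a minimal sketch of the SAA idea under illustrative assumptions: a hypothetical objective h(x, w), a fixed set of samples w_1, ..., w_N drawn once up front, and an off-the-shelf deterministic optimizer applied to the resulting sample average.

```python
# Minimal sketch of sample average approximation (SAA): the stochastic problem
# min_x E_w[h(x, w)] is replaced by a deterministic average over fixed samples
# w_1, ..., w_N, which is then handed to a standard optimizer.
# The function h below is a hypothetical stand-in, not from the paper.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

def h(x, w):
    # Hypothetical noisy objective: a quadratic whose optimum is shifted by w.
    return np.sum((x - w) ** 2)

# Fix the samples once, up front; the resulting averaged objective is deterministic.
W = rng.normal(loc=1.0, scale=0.5, size=(64, 2))  # N = 64 samples of w in R^2

def saa_objective(x):
    return np.mean([h(x, w) for w in W])

result = minimize(saa_objective, x0=np.zeros(2), method="L-BFGS-B")
print(result.x)  # close to the sample mean of the w's for this toy h
```

Because the samples are fixed, the averaged objective is an ordinary deterministic function, so gradient-based or quasi-Newton methods can be applied to it directly.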


The Parallel Knowledge Gradient Method for Batch Bayesian Optimization

Wu, Jian, Frazier, Peter

Neural Information Processing Systems

In many applications of black-box optimization, one can evaluate multiple points simultaneously, e.g. when evaluating the performances of several different neural network architectures in a parallel computing environment. In this paper, we develop a novel batch Bayesian optimization algorithm --- the parallel knowledge gradient method. By construction, this method provides the one-step Bayes optimal batch of points to sample. We provide an efficient strategy for computing this Bayes-optimal batch of points, and we demonstrate that the parallel knowledge gradient method finds global optima significantly faster than previous batch Bayesian optimization algorithms on both synthetic test functions and when tuning hyperparameters of practical machine learning algorithms, especially when function evaluations are noisy.
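The following is a small, hypothetical sketch (not the paper's implementation) of the quantity a knowledge-gradient-style batch criterion targets: the expected increase in the maximum of the posterior mean after a candidate batch is observed, estimated by averaging over fantasy outcomes drawn from the current Gaussian process posterior. The toy 1-D function, grid discretization, kernel, and two-point batch below are illustrative assumptions.

```python
# Illustrative Monte Carlo "fantasy" estimate of a batch knowledge-gradient-style
# value on a toy 1-D problem, using scikit-learn's GP with fixed hyperparameters.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
f = lambda x: np.sin(3 * x).ravel()            # hypothetical black-box function

X_obs = rng.uniform(0, 2, size=(5, 1))
y_obs = f(X_obs) + 0.1 * rng.normal(size=5)    # noisy evaluations

# optimizer=None keeps the kernel hyperparameters fixed, for simplicity.
gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5), alpha=0.1**2, optimizer=None)
gp.fit(X_obs, y_obs)

grid = np.linspace(0, 2, 200).reshape(-1, 1)   # discretization for the inner max
mu_best_now = gp.predict(grid).max()

def batch_kg(X_batch, n_fantasies=50):
    """Monte Carlo estimate of the expected gain in max posterior mean from X_batch."""
    # Draw fantasy function values at the batch from the current GP posterior.
    fantasies = gp.sample_y(X_batch, n_samples=n_fantasies, random_state=1)
    gains = []
    for k in range(n_fantasies):
        y_fant = np.concatenate([y_obs, fantasies[:, k]])
        X_all = np.vstack([X_obs, X_batch])
        gp_k = GaussianProcessRegressor(kernel=RBF(length_scale=0.5), alpha=0.1**2,
                                        optimizer=None)
        gp_k.fit(X_all, y_fant)                # condition on the fantasized batch
        gains.append(gp_k.predict(grid).max() - mu_best_now)
    return float(np.mean(gains))

candidate_batch = np.array([[0.3], [1.7]])     # q = 2 points evaluated in parallel
print(batch_kg(candidate_batch))
```

In practice one would maximize such an estimate over the batch itself (the paper provides an efficient strategy for doing so); the sketch above only evaluates it for one fixed candidate batch.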



Reviews: The Parallel Knowledge Gradient Method for Batch Bayesian Optimization

Neural Information Processing Systems

The paper is well written and easy to follow. Parallelization of BO is an important subject for practical hyperparameter optimization, and the proposed approach is interesting and more elegant than most existing approaches I am aware of. The fact that a Bayes-optimal batch is determined is very promising. The authors assume independent normally distributed errors, which is common in most BO methods based on Gaussian processes. However, in hyperparameter optimization this assumption is problematic, since measurement errors represent the difference between generalization performance and empirical estimates (e.g., through cross-validation).
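The measurement noise the reviewer refers to can be made concrete by repeating cross-validation with different splits for a fixed hyperparameter setting; the spread of the resulting scores is the error the optimizer actually observes. The dataset, model, and hyperparameter value below are illustrative choices, not taken from the paper or the review.

```python
# Illustration: the "observation noise" in hyperparameter optimization is the gap
# between an empirical estimate (e.g., cross-validation) and true generalization
# performance. Repeating CV with different splits exposes this noise directly.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import KFold, cross_val_score
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

scores = []
for seed in range(20):
    cv = KFold(n_splits=5, shuffle=True, random_state=seed)
    scores.append(cross_val_score(SVC(C=1.0, gamma="scale"), X, y, cv=cv).mean())

# The spread across repetitions is the measurement error seen by the optimizer.
print(f"mean CV accuracy: {np.mean(scores):.3f}, std across splits: {np.std(scores):.4f}")
```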

